7 research outputs found

    Versification and Authorship Attribution

    The technique known as contemporary stylometry uses different methods, including machine learning, to discover a poem’s author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint stylometry tends to ignore: versification, or the very making of language into verse. Using poetic texts in three different languages (Czech, German, and Spanish), Petr Plecháč asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. He then tests his findings on two unsolved literary mysteries. In the first, Plecháč distinguishes the parts of the Elizabethan verse play The Two Noble Kinsmen written by William Shakespeare from those written by his coauthor, John Fletcher. In the second, he seeks to solve a case of suspected forgery: how authentic was a group of poems first published as the work of the nineteenth-century Russian author Gavriil Stepanovich Batenkov? This book of poetic investigation should appeal to literary sleuths the world over.

    1800-luvun alun ”venäläinen laulu” korpustutkimuksen valossa [The early 19th-century ‘Russian song’ in the light of corpus research]

    In this article, ‘Russian songs’ from the beginning of the 19th century – i.e. imitations or ‘stylisations’ of non-ritual lyric Russian folksongs – are analysed using the methods of big data research. A corpus of ‘Russian songs’ is compared to corpora consisting of both folk songs and literary texts. The poetics of ‘Russian songs’, surprisingly enough, does not resemble that of the folk songs they are supposed to be imitating, and comes closer to the literary norms of their time. [Translated from the Finnish abstract:] The article deals with ‘Russian songs’, that is, pastiches of the non-ritual, lyric Russian folk song. These songs or romances resemble folk songs in form but are most often the work of an individual poet. The relationship between a pastiche and the object of its imitation is considerably harder to describe theoretically than the use of material borrowed from folk poetry in literature in general. Research in the opposite direction, which examines how literary works are adapted into folk poetry, has helped at a general level to understand the mechanisms of verbal folk tradition. This article, however, approaches Russian literary history and its stylistic variation by means of corpus analysis. A text corpus compiled from ‘Russian songs’ is compared with corpora compiled from both folk poetry and literary texts. Stylometric methods are used to describe a simplified model that shows the correspondences and differences between the corpora. This makes it possible to approach quantitatively the problem of how elements of folk poetry are selected and transmitted in pastiches. The analysis shows that the poetics of ‘Russian songs’ does not resemble the folk songs they imitate but is closer to the general literary norms of their time.

    Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets

    The paper discusses an approach to deciphering large collections of handwritten index cards of historical dictionaries. Our study provides a working solution that reads the cards and links their lemmas to a searchable list of dictionary entries for a large historical dictionary, the Dictionary of the 17th- and 18th-century Polish, which comprises 2.8 million index cards. We apply a tailored handwritten text recognition (HTR) solution that involves (1) an optimized detection model; (2) a recognition model to decipher the handwritten content, designed as a spatial transformer network (STN) followed by a convolutional neural network (RCNN) with a connectionist temporal classification (CTC) layer, trained using a synthetic set of 500,000 generated Polish words of different lengths; (3) a post-processing step using constrained Word Beam Search (WBC), in which the predictions were matched against a list of dictionary entries known in advance. Our model achieved an accuracy of 0.881 at the word level, which outperforms the base RCNN model. Within this study we produced a set of 20,000 manually annotated index cards that can be used for future benchmarks and transfer learning HTR applications.

    Gyenge műfajok: a költői versmérték és a jelentés közötti kapcsolat modellálása az orosz költészetben [Weak genres: modelling the relationship between poetic meter and meaning in Russian poetry]

    The paper attempts to formalize an existing theory of poetry, known as the “semantic field of meter”, which claims that the different metrical forms of modern lyric poetry accumulate and retain certain semantic associations. We examined a broad corpus of Russian poetry (1750–1950) with the LDA topic modelling algorithm in order to represent each poem in a topic space and each meter as a probability distribution over topics. Using unsupervised classification and extensive sampling, we show that the link between form and meaning is strong both within and between verse forms: two samples of the same meter often appear very similar, and two verse forms of the same family mostly fall into the same cluster as well. This link can be detected even when the corpus is controlled for chronology, and it is not a consequence of population size. We argue that a similar approach can also be applied when comparing the semantic fields of languages and poetic traditions, which could yield relevant answers to the most fundamental questions of literary history.

    Semantics of European poetry is shaped by conservative forces: The relationship between poetic meter and meaning in accentual-syllabic verse

    Recent advances in cultural analytics and large-scale computational studies of art, literature and film often show that long-term change in the features of artistic works happens gradually. These findings suggest that conservative forces that shape creative domains might be underestimated. To this end, we provide the first large-scale formal evidence of the persistent association between poetic meter and semantics in 18th- and 19th-century European literatures, using Czech, German and Russian collections with additional data from English poetry and early modern Dutch songs. Our study traces this association through a series of clustering experiments using the abstracted semantic features of 150,000 poems. With the aid of topic modeling we infer semantic features for individual poems. Texts were also lexically simplified across collections to increase generalizability and decrease the sparseness of word frequency distributions. Topics alone enable recognition of the meters in each observed language, as may be seen from highly robust clustering of same-meter samples (median Adjusted Rand Index between 0.48 and 1). In addition, this study shows that the strength of the association between form and meaning tends to decrease over time. This may reflect a shift in aesthetic conventions between the 18th and 19th centuries as individual innovation was increasingly favored in literature. Despite this decline, it remains possible to recognize the semantics of the meters from the past or the future, which suggests the continuity of semantic traditions while also revealing the historical variability of conditions across languages. This paper argues that distinct metrical forms, which are often copied in a language over centuries, also maintain long-term semantic inertia in poetry. Our findings thus highlight the role of the formal features of cultural items in influencing the pace and shape of cultural evolution.

    Deep transitions: towards a comprehensive framework for mapping major continuities and ruptures in industrial modernity

    The world is confronted by a socio-ecological emergency, requiring rapid and deep decarbonization of a broad range of socio-technical systems. A recent Deep Transitions framework argues that this fundamentally unsustainable trajectory has been generated by the co-evolutionary dynamics of multiple systems during the last 250 years. Altering this direction requires transformation in industrial modernity – a set of the most fundamental ideas, institutions, and practices characterizing every industrial society to date. Although the proponents of the framework suggest that this shift has been unfolding since the 1960s, no attempts have been made to operationalize the concept of industrial modernity and to assess this claim. This paper develops a comprehensive multi-dimensional and multi-domain approach for the measurement of industrial modernity. As such, it seeks to provide empirical evidence of long-term continuities and emerging ruptures in the dominant ideas, institutions, and practices of industrial societies along the domains of environment and technology. Using a methodologically novel approach in which the text mining of newspapers is combined with data from various databases, the paper provides results from three countries – Australia, Germany, Soviet Union/Russia – between 1900 and 2020. Despite considerable country-level differences, the results show shifts in public environmental discourse from the 1960s, followed by institutional changes from the 1980s, but with only a modest change in practices. We also observe some change in the direction of innovative activities and their regulation, coupled with a resurgent optimism in technology-environment discourse. The findings tentatively suggest that industrial modernity might be in the process of hollowing out along ideational and institutional dimensions in the environmental domain but less so in the domain of technology and innovation.

    CLS Infra Computational Literary Studies Infrastructure

    Computational Literary Studies Infrastructure (CLS INFRA), funded by the Horizon 2020 grant scheme, is a four-year, pan-European project that aims to unify the diverse landscape of computational text analysis, in terms of available texts, tools, methods, practices and so forth, within its growing international user community. The project started in February 2021, meaning that it has been underway for just over a year. In our poster we discuss the various deliverables and activities that have come out of the CLS INFRA project in its first quarter to give an idea of its impact in practice.